Identification of compositionally distinct regions in genomes using the centroid method
Abstract
MOTIVATION: Most genomic regions of special interest, such as horizontally acquired sequences and genomic islands, are known to have distinct word (m-mer) compositions. Most earlier work in this direction addressed di- and tri-nucleotide compositions. We present a method, called the centroid approach, that can reveal compositionally distinct regions in genomic sequences for any given word size.
RESULTS: We applied our method to 50 bacterial genomes and demonstrated its ability to identify embedded sequences of varying lengths from distantly related organisms. For four organisms from our dataset, we also investigated the genetic makeup of the regions that our method identified as compositionally distinct. Pathogenicity island (PAI) components and genes encoding strain-specific proteins are frequently constituents of these regions.
AVAILABILITY: The program is available on request from the authors.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
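The abstract does not spell out the algorithm, so the following is only a generic sketch of how a centroid-style scan over m-mer compositions might look, not the authors' program. It assumes an uppercase DNA string, fixed-size sliding windows, Euclidean distance to the mean (centroid) frequency vector, and a z-score cutoff; the function names `mmer_frequencies` and `centroid_scan` and the parameters `window`, `step`, and `z_cutoff` are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product
import math


def mmer_frequencies(seq, m):
    """Normalized m-mer frequency vector over all ACGT words of length m."""
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    counts = Counter(seq[i:i + m] for i in range(len(seq) - m + 1))
    # Words containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def centroid_scan(genome, m=2, window=1000, step=500, z_cutoff=2.0):
    """Flag windows whose m-mer composition lies far from the centroid.

    Assumption: 'far' means a distance z-score above z_cutoff; this
    criterion is illustrative and is not taken from the paper.
    """
    starts = list(range(0, len(genome) - window + 1, step))
    vectors = [mmer_frequencies(genome[s:s + window], m) for s in starts]
    if not vectors:
        return []
    dim = len(vectors[0])
    # Centroid = component-wise mean of all window vectors.
    centroid = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    dists = [math.dist(v, centroid) for v in vectors]
    mean = sum(dists) / len(dists)
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    if sd == 0:
        return []
    return [(s, s + window, d)
            for s, d in zip(starts, dists)
            if (d - mean) / sd > z_cutoff]
```

Calling `centroid_scan(genome, m=4)` on a genome string returns `(start, end, distance)` tuples for the flagged windows. Larger word sizes need longer windows, since the number of possible words grows as 4^m.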
Similar resources
Fuzzy Centroid-Based Method Applied to Customer Requirements Ranking in Diba Fiberglass Company
The purpose of this study is to introduce an application of a fuzzy centroid-based approach to ranking customer requirements using QFD with competition considerations for Diba Fiberglass, an Iranian company. The illustrated approach focuses not only on normal fuzzy numbers but also considers non-normal fuzzy numbers to capture the true customer requirements. To this end, first, we p...
Assessment of compositional heterogeneity within and between eukaryotic genomes.
Using large amounts of long genomic sequences, we studied the compositional patterns of eukaryotic genomes. We developed a simple measure, the compositional heterogeneity (or variability) index, to compare the differences in compositional heterogeneity between long genomic sequences. The index measures the average difference in GC content between two adjacent windows normalized by the standard ...
Investigating genomic structure using changept: A Bayesian segmentation model
Genomes are composed of a wide variety of elements with distinct roles and characteristics. Some of these elements are well-characterised functional components such as protein-coding exons. Other elements play regulatory or structural roles, encode functional non-protein-coding RNAs, or perform some other function yet to be characterised. Still others may have no functional importance, though t...
Identification and determination of different alkaloids from Atropa belladonna L. by Gas chromatography method
Background & Aim: A. belladonna (family: Solanaceae) is an important pharmaceutical plant that contains tropane alkaloids. Tropane alkaloids are a distinct group of secondary metabolites of the Solanaceae family. The most important alkaloids of A. belladonna are atropine and hyoscine, which are used extensively because of their medicinal properties. There...
Simple and Rapid Detection of Yersinia Pestis and Francisella Tularensis using Multiplex-PCR
Background: Yersinia pestis and Francisella tularensis cause plague and tularemia, which are known as diseases of the newborn and elderly, respectively. Immunological and culture-based detection methods of these bacteria are time-consuming, costly, complicated and require advanced equipment. We aimed to design and synthesize a gene structure as positive control for molecular detection of these ...
Journal title:
- Bioinformatics
Volume 23, Issue 20
Publication date: 2007